Applied Large Language Models

Zach Dickson

Fellow in Quantitative Methodology
London School of Economics

Schedule

Trulli

  • A Brief Introduction to Large Language Models (LLMs) (50 minutes)
    • Word embeddings vs. LLMs
    • Pre-trained models (BERT, GPT)
    • Applications in the social sciences

10 minute break

  • Applied Example: Text Classification
    • Fine-tune a transformer model to predict issue topics
    • Validation and verification

1 hour lunch

  • Applied Example: Topic Modeling & Text Clustering
    • Extract issue topics from social media messages

10 minute break

  • Everything else
    • State-of-the-art applications
    • Validating our models
    • Limitations
    • Future applications

My Background & Research Interests

What are Large Language Models (LLMs)?

  • A language model is a machine learning model that intends to predict the next word in a sentence given the previous words.
    • Example: Autocomplete on your phone
  • These models work by estimating the probability of a token (e.g. word), or a sequence of tokens, given the context of the sentence.
    • Example: “The cat is on the ___”
      • cup: 2.3%
      • mat: 8.9%
      • computer: 1.2%
      • coffee: 0.9%
    • The model predicts the next word is “mat” with the highest probability.
    • A sequence of tokens could be a sentence, paragraph, or entire document.

What are Large Language Models (LLMs)?

  • Modeling human language is very complex
    • Syntax, semantics, pragmatics, etc.
    • Context, ambiguity, and nuance
    • Cultural and social norms
  • As models get larger, they can capture more of these complexities
    • More parameters, more data, more context
    • Better at predicting the next word in a sentence
    • Better at understanding the meaning of words and sentences

Transformers

  • Transformers are a type of neural network architecture that has revolutionized natural language processing (NLP).
  • Transformers consist of an encoder and a decoder
    • The encoder processes the input sequence and produces a sequence of hidden states
    • The decoder takes the hidden states produced by the encoder and generates the output sequence

Transformers Architecture

Trulli

Attention Mechanism

  • Attention is a mechanism that allows the model to focus on different parts of the input sequence when making predictions.
    • The model can learn which parts of the input are most important for making predictions.
    • This allows the model to capture long-range dependencies in the data.
    • The model can also learn to focus on different parts of the input depending on the context of the sentence.

Attention Mechanism

Attention Mechanism

How do LLMs generate text?

  • LLMs generate text by sampling from the probability distribution over the vocabulary at each time step.
    • The model predicts the next word in the sequence by sampling from the distribution over the vocabulary.
    • The model can generate text one word at a time, or it can generate multiple words at once.
    • The model can also generate text conditioned on a specific input, such as a prompt or a context.
    • The model can generate text that is coherent and grammatically correct, but it can also generate text that is nonsensical or incoherent.
  • Example:
    • “My dog, Max, knows how to perform many traditional dog tricks. _______”
      • 2.3%: “For example, he can sit, stay, and roll over.”
      • 2.1%: “He can also fetch a ball, and he loves to play with his toys.”

Pre-trained Models

  • Pre-trained models are large language models that have been trained on a large amount of text data
    • Trained on a large corpus of text data, such as Wikipedia, news articles, and books
    • Unsupervised learning, which means it does not require labeled data
    • Trained for a long time, often several days or weeks
  • Pre-trained models can be fine-tuned on a specific task or dataset
    • Fine-tuning involves updating the parameters of the pre-trained model on a smaller dataset that is specific to the task
    • Fine-tuning allows the model to learn the specific patterns and relationships in the data

Pre-trained Models

  • Pre-trained models example: BERT (Bi-directional Encoder Representations from Transformers)
    • Introduced by Devlin et al. (2018)
    • Pre-trained on a large corpus of text data, such as Wikipedia and news articles
    • Fine-tuned on specific tasks, such as question answering, text classification, and named entity recognition

BERT Example

from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("Hello I'm a [MASK] model.")

[{'sequence': "[CLS] hello i'm a fashion model. [SEP]",
  'score': 0.1073106899857521,
  'token': 4827,
  'token_str': 'fashion'},
 {'sequence': "[CLS] hello i'm a role model. [SEP]",
  'score': 0.08774490654468536,
  'token': 2535,
  'token_str': 'role'},
 {'sequence': "[CLS] hello i'm a new model. [SEP]",
  'score': 0.05338378623127937,
  'token': 2047,
  'token_str': 'new'},
 {'sequence': "[CLS] hello i'm a super model. [SEP]",
  'score': 0.04667217284440994,
  'token': 3565,
  'token_str': 'super'},
 {'sequence': "[CLS] hello i'm a fine model. [SEP]",
  'score': 0.027095865458250046,
  'token': 2986,
  'token_str': 'fine'}] 

What else can we do with LLMs?

  • The real ‘magic’ of LLMs come in the way they convert text to numbers and back again.
    • This allows us to use them in a wide range of applications, such as:
      • Text classification
      • Text generation
      • Named entity recognition
      • Question answering
      • Machine translation
      • Sentiment analysis
      • Summarization
      • And many more!

A Framework for applications

  • Regression
    • Predicting a continuous variable (e.g. stock prices, house prices, ideology scores)
  • Classification
    • Predicting a categorical variable (e.g. sentiment, topic, party affiliation)
  • Clustering
    • Grouping similar data points together (e.g. customer segmentation, topic modeling)
  • Generation
    • Generating text based on a prompt or context (e.g. chatbots, creative writing)

Let’s take a break

Applied Example: Text Classification

  • In this section, we will fine-tune a transformer model for text classification.
    • We will use the Hugging Face Transformers library to fine-tune a BERT transformer model on a dataset of BBC News articles.
    • We will train the model to predict the category of the news article based on the text of the article.
    • We will evaluate the model on a test set and analyze the results.

Google Colab Notebook 1

  • Applied Example: Text Classification

Google Colab Notebook 2

  • Applied Example: Topic Modeling & Text Clustering

Tying it all together

  • Language models are a powerful tool for analyzing and generating text data.
    • They can be used for a wide range of applications, from text classification to text generation.
    • They can be fine-tuned on specific tasks or datasets to improve performance.
    • They can be used to generate text that is coherent and grammatically correct.
    • They can be used to analyze and understand text data in new and innovative ways.

Validation is Crucial

  • Validation
    • Cross-validation
    • Train-test split
    • Hyperparameter tuning
    • Model evaluation metrics

Model Evaluation Metrics

  • Accuracy
    • The proportion of correct predictions out of the total number of predictions
  • Precision
    • The proportion of true positive predictions out of the total number of positive predictions
  • Recall
    • The proportion of true positive predictions out of the total number of actual positives
  • F1 Score
    • The harmonic mean of precision and recall
    • F1 = 2 * (precision * recall) / (precision + recall)

Confusion Matrix

  • A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known.

Confusion Matrix

Advanced applications in the Social Sciences

Identifying instrumental variables in observational studies

Mining causality: Ai-assisted search for instrumental variables. Han (2024)

Han (2024)

Predicting Conflict Intensity

Introducing an Interpretable Deep Learning Approach to Domain-Specific Dictionary Creation: A Use Case for Conflict Prediction Häffner et al. (2023)

We train the neural networks on a corpus of conflict reports and match them with conflict event data. This corpus consists of over 14,000 expert-written International Crisis Group (ICG) CrisisWatch reports between 2003 and 2021

Predicting Conflict Intensity

Microtargeting and Political Persuasion

Evaluating the persuasive influence of political microtargeting with large language models Hackenburg & Margetts (2024)

[…] we integrate user data into GPT-4 prompts in real-time, facilitating the live creation of messages tailored to persuade individual users on political issues. We then deploy this application at scale to test whether personalized, microtargeted messaging offers a persuasive advantage compared to nontargeted messaging.

Microtargeting and Political Persuasion

Synthetic Data Generation

Synthetically generated text for supervised text analysis Halterman (2025)

This article proposes using LLMs to generate synthetic training data for training smaller, traditional supervised text models.

Synthetic Data Generation

Scaling Political Texts

Positioning Political Texts with Large Language Models by Asking and Averaging Le Mens & Gallego (2025)

We ask an LLM where a tweet or a sentence of a political text stands on the focal dimension and take the average of the LLM responses to position political actors such as US Senators, or longer texts such as UK party manifestos or EU policy speeches given in 10 different languages.

Scaling Political Texts

Limitations & Ethical Considerations


Not a panacea

Limitations

Bisbee et al. (2023)

Ethical Considerations in LLMs

  • Data Bias
    • Models can learn biases present in the training data
    • Biases can be harmful and perpetuate stereotypes
  • Data Privacy
    • Models can memorize sensitive information present in the training data
    • Models can be used to deanonymize data
  • Environmental Impact
    • Training large language models requires a lot of computational resources
    • This can have a significant environmental impact

References